Shaping as a Method for Accelerating Reinforcement Learning
Authors
Abstract
be facilitated by first learning to solve related simpler problems. The term "shaping" itself has been attributed to the psychologist Skinner [7], who used the technique to train animals such as rats and pigeons to perform complicated sequences of actions for rewards. Skinner describes how the technique is used to train pigeons to peck at a specific spot: We first give the bird food when it turns slightly in the direction of the spot from any part of the cage. This increases the frequency of such behavior. We then withhold reinforcement until a slight movement is made toward the spot. ...We continue by reinforcing positions successively closer to the spot, then by reinforcing only when the head is moved slightly forward, and finally only when the beak actually makes contact with the spot... The original probability of the response in its final form is very low; in some cases it may even be zero. ...By reinforcing a series of successive approximations, we bring a rare response to a very high probability in a short time. ...The total act of turning toward the spot from any point in the box, walking toward it, raising the head, and striking the spot may seem to be a functionally coherent unit of behavior; but it is constructed by a continual process of differential reinforcement from undifferentiated behavior, just as the sculptor shapes his figure from a lump of clay. (Skinner [7], pp. 92-93)

The phrase "...reinforcing a series of successive approximations..." expresses the essence of shaping. Given the task of training an animal to produce complex behavior, the trainer has to be able to (1) judge what constitutes an approximation to, or a component of, the target behavior, and (2) determine how to differentially reinforce successive approximations so that the animal easily learns the target behavior.
Unfortunately, neither of these two components of shaping has been formalized rigorously in the psychology literature, even though shaping is widely used both in psychological studies and in training pets and circus animals. Staddon [8], for example, observes that the trainer often has to rely on an intuitive understanding of the way the animal's behavior is generated when determining which behavioral variations are precursors to the target behavior and how to reinforce these precursors. Variations in the behavior of individual animals must also be accounted for when making these judgments.

mass of the pole, shortening the pole, and …
Similar Resources
Reward Shaping by Demonstration
Potential-based reward shaping is a theoretically sound way of incorporating prior knowledge in a reinforcement learning setting. While providing flexibility for choosing the potential function, under certain conditions this method guarantees the convergence of the final policy, regardless of the properties of the potential function. However, this flexibility of choice may cause confusion when ...
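The potential-based shaping rule referred to in this abstract augments the environment reward with the term F(s, s') = γΦ(s') − Φ(s), which preserves the optimal policy for any choice of Φ. A minimal sketch follows; the one-dimensional chain environment, the distance-based potential function, and the constants are hypothetical choices made here for illustration, not part of the paper:

```python
# Minimal sketch of potential-based reward shaping (Ng, Harada & Russell, 1999).
# The environment and potential function below are illustrative assumptions:
# states are integers on a 1-D chain, and the heuristic potential is the
# negative distance to a goal state.

GAMMA = 0.9  # discount factor
GOAL = 5     # hypothetical goal state on the chain

def phi(state):
    """Heuristic potential: states closer to the goal get higher potential."""
    return -abs(GOAL - state)

def shaped_reward(reward, state, next_state, gamma=GAMMA):
    """Augment the environment reward with the shaping term
    F(s, s') = gamma * phi(s') - phi(s); this potential-based form
    leaves the optimal policy unchanged for any choice of phi."""
    return reward + gamma * phi(next_state) - phi(state)

# Moving toward the goal (2 -> 3) earns a positive shaping bonus,
# while moving away (2 -> 1) is penalized, guiding exploration.
print(shaped_reward(0.0, 2, 3))  # 0 + 0.9*(-2) - (-3) = 1.2
print(shaped_reward(0.0, 2, 1))  # 0 + 0.9*(-4) - (-3) = -0.6
```

Because the shaping term telescopes along any trajectory, the total extra reward depends only on the start and end potentials, which is why the guarantee holds regardless of how Φ is chosen.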
Reward Shaping in Episodic Reinforcement Learning
Recent advancements in reinforcement learning confirm that reinforcement learning techniques can solve large scale problems leading to high quality autonomous decision making. It is only a matter of time until we see large scale applications of reinforcement learning in various sectors, such as healthcare and cyber-security, among others. However, reinforcement learning can be time-consuming be...
Abstract MDP Reward Shaping for Multi-Agent Reinforcement Learning
Kyriakos Efthymiadis, Sam Devlin and Daniel Kudenko, Department of Computer Science, The University of York, UK. Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention is shifting from tabula-rasa approaches to methods where some heuristic domain knowledge can be given...
The Influence of Reward on the Speed of Reinforcement Learning: An Analysis of Shaping
Shaping can be an effective method for improving the learning rate in reinforcement systems. Previously, shaping has been heuristically motivated and implemented. We provide a formal structure with which to interpret the improvement afforded by shaping rewards. Central to our model is the idea of a reward horizon, which focuses exploration on an MDP's critical region, a subset of states with th...
Potential-Based Shaping and Q-Value Initialization are Equivalent
Shaping has proven to be a powerful but precarious means of improving reinforcement learning performance. Ng, Harada, and Russell (1999) proposed the potential-based shaping algorithm for adding shaping rewards in a way that guarantees the learner will learn optimal behavior. In this note, we prove certain similarities between this shaping algorithm and the initialization step required for seve...
Journal:
Volume/Issue:
Pages:
Publication year: 1992